EFFECTS OF EXAMINEE RESPONSE CHANGES ON
ITEM AND TEST CHARACTERISTICS¹

Linda Crocker and Jeri Benson²

University of Florida 

A question commonly asked by examinees and test constructors alike is,
"Should the examinee change his responses to objective test items?" In attempts
to answer this question, most investigators in this area have focused on how
response changes affect the total scores of individual examinees (e.g., McMorris
and Leonard, 1976; Mueller and Shwedel, 1975; Reiling and Taylor, 1972; Jacobs,
1972; and Bath, 1967). In these studies the examinees' total test score was used
as the primary unit of analysis.

Seldom has the problem been approached from the test constructor's point of
view. Yet it might be very useful for the test constructor to know: "How do
examinee changes affect test quality?" and "Which test and item characteristics
are most likely to be affected by examinee response changes?" To answer these
questions the researcher must look beyond the examinee's total score to item analysis.

The purpose of this empirical study was to determine the effects of examinee
response changes on test and item characteristics for objective examinations.
Specifically, the following questions were investigated:

(1) How are item difficulties affected by examinee response changes?

(2) How are item statistics, such as biserial and point biserial correlation
coefficients, affected by examinee response changes?

(3) How is test reliability (i.e., internal consistency) affected by examinee
response changes?

(4) How are examinee personal biserial correlations affected by response
changes?

¹ This study was supported in part by the Institute for Development of Human
Resources at the University of Florida.

² We are grateful to Mrs. Faye Cake, Director of the Alachua County Teacher Education
Center, and Mr. William Cliett, Assistant Principal of Fort Clarke Middle School,
who supported this study and provided data which were used in the analysis.



(5) Does the use of a "Don't Know" option affect examinee item response
changes?

(6) What are the characteristics of items which have high rates of response
changes?

An important aspect of this study was to have replication across different examinee
populations and different types of objective examinations to test the
generalizability of the findings.

METHOD

Procedures

Prior to the item analyses performed in this study, tests were administered
to examinees using standard machine-scorable answer sheets and soft-lead pencils.
Examinees received no special instructions about response changes and took the
examinations as a normal part of their academic program. The tests were then
scored using the examinees' final responses. The tests were re-scored a second
time using the examinees' initial responses. (A preliminary pilot test had shown
that erasures on the answer sheet would be readily detected by visual inspection.)
The new answer sheets were prepared by the investigators based upon student erasures
on the original answer sheets. In those few cases where the examinee had made more
than one answer change per item, one of the erased responses was randomly selected
to be coded as the initial response.
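
The coding rule described above can be illustrated with a minimal sketch. The function names, the list-based representation of an answer sheet, and the fixed random seed are all assumptions made for illustration; the source describes a manual procedure and gives no code.

```python
import random


def initial_response(final, erased, rng):
    """Return the option to code as the examinee's first answer to one item.

    final  -- the option left marked on the answer sheet
    erased -- options showing visible erasures (empty list if none)
    rng    -- a random.Random instance, so the selection can be reproduced
    """
    if not erased:
        return final           # no erasure: first and final responses coincide
    return rng.choice(erased)  # one or more erasures: select one at random


def rebuild_initial_sheet(final_sheet, erasures, seed=0):
    """Build the 'first response' answer sheet for one examinee."""
    rng = random.Random(seed)
    return [initial_response(f, e, rng) for f, e in zip(final_sheet, erasures)]


# Hypothetical four-item sheet: item 2 has one erasure, item 4 has two.
final_sheet = ["A", "C", "B", "D"]
erasures = [[], ["B"], [], ["A", "C"]]
initial_sheet = rebuild_initial_sheet(final_sheet, erasures)
```

Scoring `initial_sheet` and `final_sheet` against the same key gives the two data sets compared in the item analysis.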

An item analysis was conducted on both sets of data to yield item difficulties,
biserial correlations between item scores and total scores, point biserial
correlations and personal biserial correlations. In addition, estimates of test internal
consistency were computed for both sets of data using the Kuder-Richardson 20 procedure.
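
For readers who want to reproduce this kind of item analysis, the following is a minimal sketch of three of the statistics named above, using their standard textbook definitions: item difficulty p (proportion correct), the point biserial correlation between an item and the total score, and KR-20 = k/(k-1) × (1 − Σpq/σ²). The function names and the toy score matrix are illustrative assumptions. The item analysis programs actually used in the study are not described in the source and may have differed in detail (for example, in whether the item is removed from the total score).

```python
from statistics import mean, pstdev, pvariance


def item_difficulty(scores, item):
    """Proportion of examinees answering the item correctly (the p-value)."""
    return mean([row[item] for row in scores])


def point_biserial(scores, item):
    """Correlation between a 0/1 item score and the (uncorrected) total score."""
    totals = [sum(row) for row in scores]
    item_scores = [row[item] for row in scores]
    p = mean(item_scores)
    if p in (0, 1) or pstdev(totals) == 0:
        return float("nan")  # undefined when the item or the total has no variance
    mean_pass = mean([t for t, x in zip(totals, item_scores) if x == 1])
    mean_fail = mean([t for t, x in zip(totals, item_scores) if x == 0])
    return (mean_pass - mean_fail) / pstdev(totals) * (p * (1 - p)) ** 0.5


def kr20(scores):
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    difficulties = [item_difficulty(scores, i) for i in range(k)]
    sum_pq = sum(p * (1 - p) for p in difficulties)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))


# Toy data: rows are examinees, columns are items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(scores)  # 0.8 for this toy matrix, up to floating-point rounding
```

In a study of this design the same functions would be run twice, once on the matrix scored from first responses and once on the matrix scored from final responses, and the two sets of statistics compared.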

Sample and Instruments

Answer sheets for the first study were obtained from 103 graduate and
undergraduate students enrolled in an introductory course in testing and measurement.
The 35-item test was a regular unit examination, based on course objectives.

To test the generalizability of these results for another student population
in a different testing situation, answer sheets from 289 seventh grade students
on the 40-item Metropolitan Achievement Mathematics Comprehension Subtest (MAT)
were used. These two student populations should have been dissimilar enough in
terms of age and test wiseness, and the tests should have differed sufficiently,
to determine whether results of the study would have widespread generalizability.
The MAT also had a "Don't Know" option for each item, which was not used on the
classroom test for the college student group.

RESULTS 

In general the findings could be summarized as follows:

1. Average item difficulties (p, the proportion of examinees answering an item
correctly) showed slight positive gains due to examinee response changes for both
samples. (See Tables 1 and 2.) However, the group mean gains on total test score
were not statistically significant. Despite the small size of the observed
increases, it should be noted that p-values increased on 32 out of 35 items for
the college examinees and on 39 out of 40 items for the seventh graders.

2. In general, item discrimination statistics were relatively unaffected by
changes in student responses. For the college examinees (see Table 1) there was
little or no shift in the point-biserial correlations between items and total test
scores or in the discrimination indexes. For the seventh-graders (see Table 2),
the point-biserial values were equally stable. Biserial r values were also
examined for this sample and they too showed little or no effect due to response
changes.

3. Internal consistencies of the two tests were relatively unaffected by
examinee response changes (Tables 1 and 2), in spite of the fact that the mean
number of response changes per item for the college group was 6.9 and for the
seventh grade group was 11.6. Thus, test reliability does not appear to be
adversely affected when examinees change their answers.

4. The personal biserial index is essentially the biserial correlation used
in item analysis, applied to people instead of items (Fischer, 1970). It is the
biserial correlation computed across items for a person's item scores (0 or 1)
and the proportion of people answering the items correctly.

Personal biserial correlations were calculated for the college examinee group
only (see footnote 3). There were no differences in median personal biserial for
the college examinees from their first response (r_perbis = .3[?]) to their changed
response (r_perbis = .37). There were no observed differences in the ranges of the
personal biserial for the college examinees from their first response (-.13 to .67)
to their changed response (-.12 to .67). It was noted that for those examinees who
changed only a few answers (1 to 4 changes) personal biserials had a tendency to
increase. Of those examinees who made no answer changes the personal biserial was
relatively unchanged. The greatest shifts in personal biserials were observed for
examinees making many changes in their answers (5 to 3[?] changes), but the direction
of the shifts was not consistent.

5. On 39 out of 40 items, students who originally chose the "Don't Know"
option later changed their responses. The total frequency of "Don't Know" first
responses was 344, or an average of 8.6 students per item. The total frequency
of "Don't Know" responses after changes was 98, or an average of 2.45 students per
item. Thus it is obvious that a major factor in response changes among the
seventh-graders was the shift from the "Don't Know" to another option on the test.
(Further examination of item response changes revealed that students changed from
the "Don't Know" to the correct answer approximately one-third of the time. Since
there were 4 possible responses in addition to "Don't Know," so that blind guessing
would succeed only about one-fourth of the time, it is obvious that students made
use of partial knowledge in choosing the correct answer.)

³ Readers should note that the item statistics presented for the college examinees
are difficulty, point-biserial r and index of discrimination. For the seventh
graders the item statistics presented are difficulty, point-biserial r and
biserial r. This alternation in item statistics reported was necessary because
of differences in the answer sheets used at the university and public school
levels. Different optical scanning equipment and different item analysis programs
had to be used.

6. To investigate the characteristics of items which had high rates of
response change, the 10 items on each test with the greatest number of changes
were identified. For the college examinees, these were items with an average of
11 response changes per item; for the seventh graders, these were items with an
average of 24 response changes per item. For these items, the following conditions
were observed:

• item difficulties (p) were increased slightly
• item discriminations were increased slightly for the college sample
• point biserial correlations were not affected for the college group
• point biserial and biserial correlations were increased slightly for the
  seventh grade sample
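
The personal biserial index defined in the findings above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' program: it uses the standard biserial formula, r = (M1 − M0)/s × pq/y, where y is the ordinate of the standard normal curve at the point dividing it into areas p and q. The function names and the toy data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial(binary, continuous):
    """Biserial correlation between a 0/1 variable and a continuous variable."""
    p = mean(binary)
    if p in (0, 1) or pstdev(continuous) == 0:
        return float("nan")  # undefined without variation in both variables
    m1 = mean([c for b, c in zip(binary, continuous) if b == 1])
    m0 = mean([c for b, c in zip(binary, continuous) if b == 0])
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p))  # normal ordinate at the p/q split
    return (m1 - m0) / pstdev(continuous) * p * (1 - p) / y


def personal_biserials(scores):
    """One index per examinee: own 0/1 item scores against the item p-values."""
    n = len(scores)
    k = len(scores[0])
    p_values = [sum(row[i] for row in scores) / n for i in range(k)]
    return [biserial(row, p_values) for row in scores]


# Toy data: rows are examinees, columns are items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 0],  # misses only the hardest item
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],  # misses the easy items but answers the hard ones
    [1, 1, 1, 1],  # perfect score: index undefined
]
indices = personal_biserials(scores)
```

An examinee whose correct answers fall on the easier items receives a positive index, and one who succeeds mainly on the harder items receives a negative index. With very few items the biserial is unstable and can exceed 1 in absolute value, so the toy values should not be read as typical.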

DISCUSSION 

In summary, those who construct and administer tests should be heartened by
these results, indicating that a moderate amount of response changing has no
adverse effect on test quality. If anything, item discriminations may be slightly
improved when examinees change responses.

To answer the question often raised by examinees, "Should I change my answers?",
the best advice seems to be that response changes improve scores more often than
they lower them (albeit to a very slight degree). In this study among the 106
college examinees, 60 (57%) increased their scores and only 9 (8.5%) decreased
their scores by changing item responses. In the replication study of 289 seventh
grade examinees, 135 examinees (47%) increased their scores while only 7 (or 2%)
actually lost points by changing their responses. Looking at all item responses
to the test, for the college examinees, 62% of all item response changes yielded
the correct response while 1[?]% of all response changes resulted in loss of the
correct response. For the seventh graders, 55% of all item response changes
resulted in the correct answer and 1[?]% resulted in an incorrect answer. Thus
teachers who advise their students against changing responses may actually do
their students a disservice.
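
The tallies reported in this paragraph rest on comparing each examinee's first and final responses against the answer key. A minimal sketch of that bookkeeping follows; the function names, the three category labels, and the five-item example are assumptions made for illustration, not the authors' procedure.

```python
def net_gain(initial, final, key):
    """Final score minus first-response score for one examinee."""
    first_score = sum(a == k for a, k in zip(initial, key))
    final_score = sum(a == k for a, k in zip(final, key))
    return final_score - first_score


def classify_changes(initial, final, key):
    """Count one examinee's response changes by their effect on the item score."""
    counts = {"wrong_to_right": 0, "right_to_wrong": 0, "wrong_to_wrong": 0}
    for first, last, correct in zip(initial, final, key):
        if first == last:
            continue  # response was not changed
        if last == correct:
            counts["wrong_to_right"] += 1
        elif first == correct:
            counts["right_to_wrong"] += 1
        else:
            counts["wrong_to_wrong"] += 1
    return counts


# Hypothetical five-item test.
key = ["A", "B", "C", "D", "A"]
initial = ["A", "C", "C", "A", "B"]
final = ["A", "B", "D", "B", "A"]

gain = net_gain(initial, final, key)             # +1: score rises from 2 to 3
changes = classify_changes(initial, final, key)  # 2 gains, 1 loss, 1 neutral change
```

Counting examinees with positive and negative `gain` gives the kind of per-examinee figures quoted above, and summing `changes` over examinees gives the per-change percentages.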

TABLE 1

Test and Item Statistics Based on Responses of College Examinees
Before and After Response Changes
(N = 106)

Test and Item Characteristics        First Responses    Changed Responses

Mean Item Difficulty (p)             .65                .68
Range of Item Difficulty             [illegible]        .38-.95

Median Item-Test point-biserial r    .36                .39
Range of point-biserial r values     .02-.55            .07-.60

Median Item Discrimination           .39                .40
Range of Item Discrimination         [illegible]        .07-.79

Internal Consistency (KR-20)         .79                .80
Standard Error of Measurement        2.53               2.51

Overall Test Mean                    22.86              23.90
Overall Test Standard Deviation      5.5[?]             5.62
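
The standard error of measurement in Table 1 can be related to the other entries through the conventional formula below. The source does not state which formula was used, so this is an assumption; with the changed-response values in Table 1 (standard deviation 5.62, KR-20 of .80) it reproduces the reported 2.51.

```latex
\mathrm{SEM} = s_X \sqrt{1 - r_{XX'}}, \qquad
\mathrm{SEM}_{\text{changed}} = 5.62\,\sqrt{1 - .80} \approx 2.51
```

Here $s_X$ is the standard deviation of total scores and $r_{XX'}$ is the reliability estimate. Because the tabled reliabilities are rounded to two decimals, the same check on other columns may agree only approximately.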

TABLE 2

Test and Item Statistics Based on Responses of 7th Grade Examinees
Before and After Response Changes
(N = 289)

Test and Item Characteristics        First Responses    Changed Responses

Mean Item Difficulty (p)             .53                .55
Range of Item Difficulty             .27-.93            .29-.93

Median Item-Test point-biserial r    .46                [illegible]
Range of point-biserial r values     .19-.65            .19-.64

Median Item-Test biserial r          .58                .58
Range of biserial r values           .26-.83            .27-.84

Internal Consistency (KR-20)         .90                .90
Standard Error of Measurement        2.69               2.66

Overall Test Mean                    21.4[?]            22.18
Overall Test Standard Deviation      8.68               8.73